ABSTRACT

In this paper we describe a method for smoothing disjoint formant track
boundaries that can arise in packet voice communication. The disjoint
boundaries can occur when missing packets are reconstructed using waveform
substitution methods. Our algorithm attempts to match formant locations to
LPC poles and then adjusts the pole locations across the boundary to smooth
the formants at the boundary. We describe our method for matching formants
to poles and our method for adjusting the poles. The LPC residual is not
modified. Time-domain plots and LPC spectrograms are used to show that the
formant track was smoothed. Subjective listening tests confirm that the
method improves speech quality: the processed speech sounds clearer and
more natural.

1.  INTRODUCTION 

Voice packet communication refers to the speech coding technique in which
sampled speech is broken up into packets, encoded, transmitted, and decoded
at the receiver. Usually these packets are 8-48 ms in duration. The receiver
may miss packets because of noise bursts, transmission errors, or congested
channels, so the missing packets must be reconstructed from the surrounding
packets. Many techniques have been developed to compensate for missing
packets [1, 9, 5]. The simplest method involves no reconstruction and simply
inserts silence for any missing packets. This has been shown to cause
noticeable artifacts in the received speech and is disturbing to listeners.
Another solution is waveform substitution, in which the missing packet is
estimated from the previous packet (and possibly the succeeding packet).
Waveform substitution can involve pattern matching, pitch waveform
replication, and packet merging [4, 6, 10]. Some of these methods try to
smooth the time discontinuity at the boundaries between the missing packet
and the two packets it adjoins.

The waveform substitution method is very efficient and works well as long
as the speech is relatively static during the missing packet. However, if
significant changes occur in the speech during the missing packet, the
resulting speech can contain noticeable acoustical artifacts and can sound
unnatural. We develop a method that reduces the disjointness that arises at
the boundaries of missing packets when the speech is changing quickly. This
method attempts to smooth a speaker's formants across these boundaries by
warping the LPC poles on both sides of the boundary toward a mid-point. This
ensures a smooth pole transition, which, if the poles are mapped
appropriately to the speaker's formants, will result in a smooth formant
track.


2.  METHOD 

In this example we used a pitch waveform replication method [4] to
reproduce the missing packet of speech. This leads to an abrupt formant
change, as seen in Figure 3, which we now try to compensate for using our
algorithm. Given two short speech segments s1 and s2 on either side of the
disjoint boundary, we compute LPC-9 coefficients [8, 7, 2] on windowed
speech segments, producing two LPC matrices S1LPC (M x N1) and S2LPC
(M x N2). Here M is the LPC order and N1 and N2 are the numbers of frames in
s1 and s2. Each column represents the LPC vector used for that frame of
speech. The residual matrices used for synthesis are also computed, but
these are not altered. From the LPC matrices the poles are calculated as the
LPC polynomial roots, forming the two matrices S1PL and S2PL, each
containing the same number of frames as its corresponding LPC matrix.
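
As a rough illustration of the pole computation, the sketch below converts a matrix of LPC coefficients into a matrix of poles, one frame (column) at a time. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, with column j holding a_1..a_M; the paper does not state its sign convention. The function name `lpc_to_poles` and the use of NumPy are choices made for this example only.

```python
import numpy as np

def lpc_to_poles(lpc):
    """Return an (M, N) complex matrix of poles from an (M, N) LPC matrix.

    Assumed convention (not stated in the paper): column j holds a_1..a_M of
    A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, so the poles of 1/A(z) are the
    roots of z^M + a_1 z^(M-1) + ... + a_M.
    """
    lpc = np.asarray(lpc, dtype=float)
    M, N = lpc.shape
    poles = np.empty((M, N), dtype=complex)
    for j in range(N):
        # Prepend the leading 1 and take the polynomial roots for this frame.
        poles[:, j] = np.roots(np.concatenate(([1.0], lpc[:, j])))
    return poles
```

The order of the roots within a column is whatever `np.roots` returns, so any later step that relies on ordering has to sort the poles itself.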

Next, the poles of the last frame of s1 (last column of S1PL) are paired
with the poles of the first frame of s2 (first column of S2PL) in such a way
that the poles correspond to the formant locations. This is done by first
sorting the poles in each frame by angle and then labeling a pole as a
formant if its magnitude is greater than 0.8. This threshold was found
empirically and has worked well for finding formants. The first pole in the
sorted vector that has magnitude greater than 0.8 is labeled as the first
formant, the second such pole is labeled as the second formant, and so on.
The algorithm tries to find four formants. Any pole not labeled as a formant
is not altered in the warping process. If the two matched frames have
differing numbers of formant poles, the lower number is used and the
unmatched formants are not altered.
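
The labeling and pairing rule can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that only poles in the upper half-plane (positive imaginary part) are candidates, since the poles of a real LPC polynomial come in conjugate pairs, and the names `label_formants` and `pair_formants` are hypothetical.

```python
import cmath

def label_formants(poles, threshold=0.8, max_formants=4):
    """Return indices of the poles labeled as formants, ordered by angle.

    Assumption: only upper-half-plane poles are considered; their conjugate
    partners would be handled alongside them by the caller.
    """
    upper = [k for k, p in enumerate(poles) if p.imag > 0]
    upper.sort(key=lambda k: cmath.phase(poles[k]))
    # Keep the poles whose magnitude exceeds the empirical threshold.
    formants = [k for k in upper if abs(poles[k]) > threshold]
    return formants[:max_formants]

def pair_formants(p1, p2, threshold=0.8, max_formants=4):
    """Pair the i-th formant pole of p1 with the i-th formant pole of p2.

    If the two frames yield different numbers of formants, the lower number
    is used and the remaining poles are left unpaired.
    """
    f1 = label_formants(p1, threshold, max_formants)
    f2 = label_formants(p2, threshold, max_formants)
    n = min(len(f1), len(f2))
    return list(zip(f1[:n], f2[:n]))
```

Each returned pair `(k1, k2)` gives the index of a formant pole in the last frame of s1 and the index of its counterpart in the first frame of s2.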

Figure 1: Percent warp of poles toward mid-point pole vector as a function
of frame number. (Plot title: Warping factor as a function of frame number.)

Figure 2: Speech waveforms showing waveform substitution and formant
smoothing. (Plot title: Speech with Missing Formant Transition.)

Once the poles are matched to formants, a mid-point between the two pole
vectors p1 and p2, where p1 is the last column of S1PL and p2 is the first
column of S2PL, is calculated in phase and magnitude as

[equation missing from the source text]

for each pole in the vector that corresponds to a formant. Here i indicates
the formant number. Then each frame of poles in S1PL and S2PL is recomputed
by warping them towards the mid-point poles as [3]


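$$ pnew_{i,j} = \text{[right-hand side illegible in source]} $$

where j = frame number,

$$ \alpha = \left( \frac{j}{N1 - 0.5} \right)^{f} $$

for p = p1, and

$$ \alpha = \left( \frac{N1 + N2 - j}{N2 - 0.5} \right)^{f} $$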

for p = p2. Here pnew is the new pole matrix and f is a simple scalar value
that determines the shape of the warping factor α. The amount of warp is a
nonlinear function of the distance of the frame from the boundary. The
warping function for f = 2 can be seen in Figure 1. This value has been
determined empirically through subjective listening tests. The closer the
frame is to the boundary, the more it is warped toward the mid-point poles,
so that at the boundary the poles from the two segments will have been
completely warped to the mid-point pole vector. In Figure 1, N1 = 16,
N2 = 24, and the mid-point occurs at the boundary, frame number 16, where
the warp is almost 1. So, e.g., frame 10 will be warped approximately 22%
towards the mid-point pole vector.
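
The warping step can be sketched as below, under several assumptions that the text does not settle. The warping factor is read as α = (j / (N1 - 0.5))^f for frames of s1 and α = ((N1 + N2 - j) / (N2 - 0.5))^f for frames of s2, with j counted from 1 across both segments and α clamped at 1. The mid-point is assumed to be the average of the two magnitudes and of the two phases, and the warp itself is assumed to be a linear interpolation in magnitude and phase. All function names are hypothetical.

```python
import cmath

def warp_factor(j, n1, n2, f=2.0):
    """Warping factor for frame j (assumed 1-based across both segments)."""
    if j <= n1:
        a = (j / (n1 - 0.5)) ** f            # frames of s1
    else:
        a = ((n1 + n2 - j) / (n2 - 0.5)) ** f  # frames of s2
    return min(a, 1.0)  # clamp is an assumption, not from the paper

def midpoint_pole(p1, p2):
    """Assumed mid-point: average magnitude and average phase."""
    mag = 0.5 * (abs(p1) + abs(p2))
    ang = 0.5 * (cmath.phase(p1) + cmath.phase(p2))
    return cmath.rect(mag, ang)

def warp_pole(p, pm, alpha):
    """Assumed warp: move p a fraction alpha of the way toward pm,
    interpolating magnitude and phase separately."""
    mag = (1.0 - alpha) * abs(p) + alpha * abs(pm)
    ang = (1.0 - alpha) * cmath.phase(p) + alpha * cmath.phase(pm)
    return cmath.rect(mag, ang)
```

With α = 0 a pole is left where it is, and with α = 1 it lands on the mid-point pole, which matches the described behaviour at the two ends of the warping range.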

Once all the poles have been recomputed, the speech is resynthesized by
computing new LPC coefficients from the new pole matrix and resynthesizing
using the old residual. The final frame of the first segment and the first
frame of the second segment will now be spectrally smooth in the sense of a
smooth pole transition. This will correspond to a smooth formant transition
if the poles are matched correctly.

Figure 3: LPC spectrogram of disjoint formants caused by waveform
substitution.

Figure 4: LPC spectrogram of boundary after pole warping.
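
The resynthesis step described in Section 2 can be illustrated with the sketch below: new LPC coefficients are obtained by expanding the product of (1 - p z^-1) over the warped poles, and the unmodified residual is passed through the resulting all-pole filter. It assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, and it assumes that each warped formant pole is written back together with its complex conjugate so that the coefficients stay real. The function names and the way filter state is carried between frames are choices made for this example.

```python
def poles_to_lpc(poles):
    """Expand prod_k (1 - p_k z^-1) and return the real parts of a_1..a_M."""
    coeffs = [1.0 + 0j]
    for p in poles:
        nxt = coeffs + [0j]
        for k in range(1, len(nxt)):
            # Multiplying by (1 - p z^-1): new[k] = old[k] - p * old[k-1].
            nxt[k] -= p * coeffs[k - 1]
        coeffs = nxt
    # With conjugate-paired poles the imaginary parts are only rounding noise.
    return [c.real for c in coeffs[1:]]

def synthesize(residual, a, state=None):
    """All-pole synthesis: y[n] = e[n] - sum_k a[k] * y[n-k].

    `state` holds the previous outputs (most recent first) so that
    consecutive frames can be filtered without a discontinuity.
    """
    M = len(a)
    hist = list(state) if state is not None else [0.0] * M
    out = []
    for e in residual:
        y = e - sum(a[k] * hist[k] for k in range(M))
        out.append(y)
        hist = [y] + hist[:-1]
    return out, hist
```

Calling `synthesize` frame by frame, each time with that frame's new coefficients and the state returned by the previous call, reproduces the frame-wise synthesis in which only the filter changes and the residual stays as computed.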


3.  RESULTS 

Figure 2 shows three speech segments. The first subplot shows a speech
waveform that was put through a voice packet communication system where a
packet was lost. The next subplot shows the missing packet replaced with
pieces from its surrounding packets in order to fill the hole. Around sample
800 we can see a very abrupt change in the waveform. It is clear that the
statistics of the waveform changed very rapidly at this point. This is due
to the fact that a formant change occurred in the missing packet.

We took the waveform in the second subplot and ran it through our algorithm
to smooth the formants. The results can be seen in the third subplot. Note
that the transition region is now much smoother. Perceptually, the waveform
in the third subplot sounds more natural and clearer than the waveform in
the second subplot.

The smoothing effects of our algorithm can be seen more clearly in Figures 3
and 4. Figure 3 shows an expanded view of an LPC spectrogram of the speech
waveform in the second subplot of Figure 2. We use LPC spectrograms in order
to highlight the formant structure. In this figure we can clearly see where
the abrupt formant change occurs. This formant change has an unnatural
perceptual sound quality to it. Figure 4 shows an LPC spectrogram of the
speech waveform from the third subplot of Figure 2. Here we can see that the
formant transition across the boundary is now much smoother, especially in
the second formant, which had the most severe change.

4.  CONCLUSION 

In this paper we have developed a method of smoothing disjoint formant
tracks that can occur in waveform-substituted speech. It should be noted
that this method does not degrade the performance of the waveform
substitution methods when the speech is relatively static. This is because
the formants will be relatively static and will not be warped a significant
amount, if any. A good preprocessing step would be to first detect a
disjoint formant, then bypass the formant smoothing if the formants are not
disjoint.

The only disadvantage of this method is the increased processing time for
the analysis and synthesis. Since only the boundaries, which are very short
in duration, need to be smoothed, our method still involves much less
computation than a full synthesis technique.